Julia vs. Other Technical Computing Languages

Why is vectorized code fast ? And, why is not as fast as it could be ?

Generators, Comprehensions, Reduction, Broadcasting, Syntactic Loop Fusion, ...

Author: Ján Dolinský, Tangent Works

Generators

Example 1: Evaluating a polynomial

Matlab/R Vectorized Way


In [ ]:
using BenchmarkTools

In [ ]:
a, f = rand(10^5), rand(10^5);

In [ ]:
sizeof(a)

In [ ]:
poly_R(x) = 3x.^2 + 2x - sqrt.(x) - 1

In [ ]:
@btime poly_R(a);

Conclusion:
7 * 800_000 bytes = 5_600_000 bytes (approx. 5.34MiB)
The code in function poly_R() does 7 unnecessary allocations!

tmp1 = x.^2
tmp2 = 3*tmp1
tmp3 = 2x
tmp4 = sqrt.(x)
tmp5 = tmp2 + tmp3
tmp6 = tmp5 - tmp4
tmp7 = tmp6 - 1

Next: Let's rewrite poly_R() into poly() and use it as our kernel function.


In [ ]:
poly(x) = 3x^2 + 2x - x - 1

Julia: Explicit Loop


In [ ]:
function poly_J_loop(v)
    u = similar(v)
    
    for i  eachindex(v)
        u[i] = poly(v[i])
    end
    
    u
end

In [ ]:
@btime poly_J_loop(a);

Conclusion:
1 * 800_000 bytes = 800_000 bytes (approx. 781.34KiB)
Only a single allocation for target vector is made.

Julia: Generator & Comprehension


In [ ]:
poly_J(v) = [ poly(x) for x in v ]

In [ ]:
poly_J_alternative(v) = [ 3x^2 + 2x - x - 1 for x in v ]

In [ ]:
@btime poly_J(a);

Conclusion:
1 * 800_000 bytes = 800_000 bytes (approx. 781.34KiB)
Only a single allocation for target vector is made.

Julia: Map()


In [ ]:
poly_J_map(v) = map(poly, v)

In [ ]:
poly_J_map_alternative(v) = map(x -> 3x^2 +2x - x - 1, v)

In [ ]:
@btime poly_J_map(a);

Julia: Loop fusion


In [ ]:
@btime poly.(a);

Generators & Reduction Concept

Example 1: Sum of a Polynomial

  • generator: polynomial expression
  • reducer: sum

Matlab/R Vectorized Way


In [ ]:
polysum_R(v) = sum(poly_R(v))

In [ ]:
@btime polysum_R(a)

Julia way


In [ ]:
@btime sum(poly, a)

Julia way: Explicit mapreduce()


In [ ]:
@btime mapreduce(poly, +, a)

Example 2: Mean of a Polynomial

  • generator: polynomial expression
  • reducer: mean

In [ ]:
@btime mean(poly, a)

Example 3: Mean Squared Error

  • generator: squared residual
  • reducer: mean

Matlab/R Vectorized Way


In [ ]:
mse_R(a, f) = mean((a - f).^2)

In [ ]:
sizeof(a)

In [ ]:
@btime mse_R(a, f)

Conclusion:
2 * 800_000 bytes = 1_600_000 bytes (approx. 1.53MiB)

tmp1 = a - f
tmp2 = tmp1.^2

Julia way


In [ ]:
mse_J(a, f) = mean( (a[i] - f[i])^2 for i  eachindex(a) )

In [ ]:
@btime mse_J(a, f)

In [ ]:
mse_J_zip_alternative(a, f) = mean( (x - y)^2 for (x, y)  zip(a, f) )

In [ ]:
@btime mse_J_zip_alternative(a, f)

Example 4: Mean Absolute Percentage Error

  • generator: absolute percentage error
  • reducer: mean

Matlab/R way


In [ ]:
mape_R(a, f) = 100 * mean(abs.((a - f) ./ a))

In [ ]:
@btime mape_R(a, f)

In [ ]:
mape_J(a, f) = 100 * mean( abs((a[i] - f[i]) / a[i]) for i  eachindex(a) if a[i] != 0 )

In [ ]:
@btime mape_J(a, f)

In [ ]:
mape_J_zip_alternative(a, f) = 100 * mean( abs((x - y) / x) for (x, y)  zip(a, f) if x != 0 )

In [ ]:
@btime mape_J_zip_alternative(a, f)

Broadcasting

Element-by-element operation on arrays of different sizes.

Example 1: Mean correction


In [ ]:
m = [fill(1.0, 10) fill(2.0, 10) fill(3.0, 10)]

In [ ]:
μ = mean(m, 1)

In [ ]:
m .- μ

In [ ]:
broadcast((x, y) -> x - y, m , μ)

Example 2: Z-Score


In [ ]:
m = rand(10^3, 3);

In [ ]:
_zscore(x, μ, σ) = (x - μ) / σ

In [ ]:
function zscore(m)
    μ = mean(m)
    σ = std(m)
    
    _zscore.(m , μ, σ)
end

In [ ]:
@time zscore(m);

Challenge
Implement Rank-1 update and explain the problem with a Matlab/R implementation of such a problem


In [ ]:
A, u, v = ones(10^3, 10^2), fill(2.0, 10^3), fill(3.0, 10^2)

In [ ]:
sizeof(A)

In [ ]:
rank1_naive(A, u, v) = A - u*v'

In [ ]:
@time rank1_naive(A, u, v)

In [ ]:
_rank1(x, y, z) = x - y * z

In [ ]:
rank1_J_1(A, u, v) = _rank1.(A, u, v')

In [ ]:
@time rank1_J_1(A, u, v);

In [ ]:
rank1_J_2(A, u, v) = A .- u.*v'

In [ ]:
@time rank1_J_2(A, u, v);

In [ ]:
rank1_J_3(A, u, v) = broadcast(_rank1, A, u, v')

In [ ]:
@time rank1_J_3(A, u, v)